Imagine there are \(N\) objects (geographical locations, characteristics of people, etc.) In some cases, it is hard to depict each of these \(N\) objects explicitly with a vector \(\vec{y} \in \mathbb{R}^{N}\), however, we can measure the degree of “closeness” (proximity) between these \(N\) objects. In other words, we can obtain a \(N \times N\) matrix \(\Delta\), with \(\Delta_{rs}\) representing the distance/dissimilarity between objects \(r\) and \(s\). Assume \(\Delta_{rr}=0\) and \(\Delta_{sr} = \Delta_{rs}\). \(\Delta\) is often called a Proximity/Distance/Dissimilarity Matrix.
Given the Proximity Matrix \(\Delta\), MDS (Multi-dimensional Scaling) comes to place when it comes to recovering the original/latent individual configuration \(\vec{y}_1, \vec{y}_2, \dots, \vec{y}_N \in \mathbb{R}^{t}\).
The primary goal of MDS is to reconstruct the original configuration as close as possible, say, find points \(\tilde{y}_1, \tilde{y}_2, \dots, \tilde{y}_N \in \mathbb{R}^{t^{\prime}}\) s.t. the distance between \(\tilde{y}_r\) and \(\tilde{y}_s\) is close to \(\Delta_{rs}\). Sometimes, the true dimension \(t\) is unknown, so determining the number of dimensions is also a major problem to be solved.
MDS can also serve as a data visualization method. Usually, \(t^{\prime}\) is set to be either two or three, and the \(N\) points in the two-dimensional (or three-dimensional) plot can show an obvious sign of clustering, where points in a particular cluster are viewed as being “closer” to the other points in that cluster than to points in other clusters.
Here we use the sphere distances of multiple large cities as an example to illustrate the power of MDS in recovering the original expressions and visualizing them.
The eurodist R package provides road distance between 21 cities in Europe and 9 cities in the US. For the moment, we will focus on the 10 US cities with names and distances given in the following tables
| Atlanta | Chicago | Denver | Houston | LosAngeles | Miami | NewYork | SanFrancisco | Seattle | Washington.DC | |
|---|---|---|---|---|---|---|---|---|---|---|
| Atlanta | 0 | 587 | 1212 | 701 | 1936 | 604 | 748 | 2139 | 2182 | 543 |
| Chicago | 587 | 0 | 920 | 940 | 1745 | 1188 | 713 | 1858 | 1737 | 597 |
| Denver | 1212 | 920 | 0 | 879 | 831 | 1726 | 1631 | 949 | 1021 | 1494 |
| Houston | 701 | 940 | 879 | 0 | 1374 | 968 | 1420 | 1645 | 1891 | 1220 |
| LosAngeles | 1936 | 1745 | 831 | 1374 | 0 | 2339 | 2451 | 347 | 959 | 2300 |
| Miami | 604 | 1188 | 1726 | 968 | 2339 | 0 | 1092 | 2594 | 2734 | 923 |
| NewYork | 748 | 713 | 1631 | 1420 | 2451 | 1092 | 0 | 2571 | 2408 | 205 |
| SanFrancisco | 2139 | 1858 | 949 | 1645 | 347 | 2594 | 2571 | 0 | 678 | 2442 |
| Seattle | 2182 | 1737 | 1021 | 1891 | 959 | 2734 | 2408 | 678 | 0 | 2329 |
| Washington.DC | 543 | 597 | 1494 | 1220 | 2300 | 923 | 205 | 2442 | 2329 | 0 |
After conducting classical MDS (a method of MDS, details will be discussed in proceeding parts), we acquire the plot below. The plot complies to our geographical knowledge. Miami and Seattle are the farthest from our plot, and these two cities are actually thousands of miles away (Seattle in the north-west end of US mainland and Miami in the south-east corner). NY and D.C. are quite close on the plot, same for LA and San Francisco.
Actually, if we rotate the above plot 180 degrees, a US map appears. It
is amazing!
Again, let’s try MDS on the 21 European cities. We make some flipping on
the map to make it comply to the true European map.
| Athens | Barcelona | Brussels | Calais | Cherbourg | Cologne | Copenhagen | Geneva | Gibraltar | Hamburg | Hook of Holland | Lisbon | Lyons | Madrid | Marseilles | Milan | Munich | Paris | Rome | Stockholm | Vienna | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Athens | 0 | 3313 | 2963 | 3175 | 3339 | 2762 | 3276 | 2610 | 4485 | 2977 | 3030 | 4532 | 2753 | 3949 | 2865 | 2282 | 2179 | 3000 | 817 | 3927 | 1991 |
| Barcelona | 3313 | 0 | 1318 | 1326 | 1294 | 1498 | 2218 | 803 | 1172 | 2018 | 1490 | 1305 | 645 | 636 | 521 | 1014 | 1365 | 1033 | 1460 | 2868 | 1802 |
| Brussels | 2963 | 1318 | 0 | 204 | 583 | 206 | 966 | 677 | 2256 | 597 | 172 | 2084 | 690 | 1558 | 1011 | 925 | 747 | 285 | 1511 | 1616 | 1175 |
| Calais | 3175 | 1326 | 204 | 0 | 460 | 409 | 1136 | 747 | 2224 | 714 | 330 | 2052 | 739 | 1550 | 1059 | 1077 | 977 | 280 | 1662 | 1786 | 1381 |
| Cherbourg | 3339 | 1294 | 583 | 460 | 0 | 785 | 1545 | 853 | 2047 | 1115 | 731 | 1827 | 789 | 1347 | 1101 | 1209 | 1160 | 340 | 1794 | 2196 | 1588 |
| Cologne | 2762 | 1498 | 206 | 409 | 785 | 0 | 760 | 1662 | 2436 | 460 | 269 | 2290 | 714 | 1764 | 1035 | 911 | 583 | 465 | 1497 | 1403 | 937 |
| Copenhagen | 3276 | 2218 | 966 | 1136 | 1545 | 760 | 0 | 1418 | 3196 | 460 | 269 | 2971 | 1458 | 2498 | 1778 | 1537 | 1104 | 1176 | 2050 | 650 | 1455 |
| Geneva | 2610 | 803 | 677 | 747 | 853 | 1662 | 1418 | 0 | 1975 | 1118 | 895 | 1936 | 158 | 1439 | 425 | 328 | 591 | 513 | 995 | 2068 | 1019 |
| Gibraltar | 4485 | 1172 | 2256 | 2224 | 2047 | 2436 | 3196 | 1975 | 0 | 2897 | 2428 | 676 | 1817 | 698 | 1693 | 2185 | 2565 | 1971 | 2631 | 3886 | 2974 |
| Hamburg | 2977 | 2018 | 597 | 714 | 1115 | 460 | 460 | 1118 | 2897 | 0 | 550 | 2671 | 1159 | 2198 | 1479 | 1238 | 805 | 877 | 1751 | 949 | 1155 |
| Hook of Holland | 3030 | 1490 | 172 | 330 | 731 | 269 | 269 | 895 | 2428 | 550 | 0 | 2280 | 863 | 1730 | 1183 | 1098 | 851 | 457 | 1683 | 1500 | 1205 |
| Lisbon | 4532 | 1305 | 2084 | 2052 | 1827 | 2290 | 2971 | 1936 | 676 | 2671 | 2280 | 0 | 1178 | 668 | 1762 | 2250 | 2507 | 1799 | 2700 | 3231 | 2937 |
| Lyons | 2753 | 645 | 690 | 739 | 789 | 714 | 1458 | 158 | 1817 | 1159 | 863 | 1178 | 0 | 1281 | 320 | 328 | 724 | 471 | 1048 | 2108 | 1157 |
| Madrid | 3949 | 636 | 1558 | 1550 | 1347 | 1764 | 2498 | 1439 | 698 | 2198 | 1730 | 668 | 1281 | 0 | 1157 | 1724 | 2010 | 1273 | 2097 | 3188 | 2409 |
| Marseilles | 2865 | 521 | 1011 | 1059 | 1101 | 1035 | 1778 | 425 | 1693 | 1479 | 1183 | 1762 | 320 | 1157 | 0 | 618 | 1109 | 792 | 1011 | 2428 | 1363 |
| Milan | 2282 | 1014 | 925 | 1077 | 1209 | 911 | 1537 | 328 | 2185 | 1238 | 1098 | 2250 | 328 | 1724 | 618 | 0 | 331 | 856 | 586 | 2187 | 898 |
| Munich | 2179 | 1365 | 747 | 977 | 1160 | 583 | 1104 | 591 | 2565 | 805 | 851 | 2507 | 724 | 2010 | 1109 | 331 | 0 | 821 | 946 | 1754 | 428 |
| Paris | 3000 | 1033 | 285 | 280 | 340 | 465 | 1176 | 513 | 1971 | 877 | 457 | 1799 | 471 | 1273 | 792 | 856 | 821 | 0 | 1476 | 1827 | 1249 |
| Rome | 817 | 1460 | 1511 | 1662 | 1794 | 1497 | 2050 | 995 | 2631 | 1751 | 1683 | 2700 | 1048 | 2097 | 1011 | 586 | 946 | 1476 | 0 | 2707 | 1209 |
| Stockholm | 3927 | 2868 | 1616 | 1786 | 2196 | 1403 | 650 | 2068 | 3886 | 949 | 1500 | 3231 | 2108 | 3188 | 2428 | 2187 | 1754 | 1827 | 2707 | 0 | 2105 |
| Vienna | 1991 | 1802 | 1175 | 1381 | 1588 | 937 | 1455 | 1019 | 2974 | 1155 | 1205 | 2937 | 1157 | 2409 | 1363 | 898 | 428 | 1249 | 1209 | 2105 | 0 |
The above plot reconstructs the European map quite well. Gibraltar, Lisbon and Madrid are in the south-west corner, the two North European cities Stockholm and Copenhagen are in the north end, and Athens is in the south-west corner.
Finally, let’s consider the 18 representative global cities. Their pairwise flight lengths (geodesic distance) are shown in the table below. As we can see, the geodesic distances between the three Southern-Hemisphere cities: Rio, Cape Town, Melbourne and other Northern-Hemisphere cities are generally large (almost all over 10,000 kilometers)
| Beijing | Cape Town | Hong Kong | Honolulu | London | Melbourne | Mexico City | Montreal | Moscow | New Delhi | New York | Paris | Rio | Rome | S.F. | Singapore | Stockholm | Tokyo | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Beijing | 0 | 12947 | 1972 | 8171 | 8160 | 9093 | 12478 | 10490 | 5809 | 2788 | 11012 | 8236 | 17325 | 8144 | 9524 | 4465 | 6725 | 2104 |
| Cape Town | 12947 | 0 | 11867 | 18562 | 9635 | 10388 | 13703 | 12744 | 10101 | 9284 | 12551 | 9307 | 6075 | 8417 | 16487 | 9671 | 10334 | 14737 |
| Hong Kong | 1972 | 11867 | 0 | 8945 | 9646 | 7392 | 14155 | 12462 | 7158 | 3770 | 12984 | 9650 | 17710 | 9300 | 11121 | 2575 | 8243 | 2893 |
| Honolulu | 8171 | 18562 | 8945 | 0 | 11653 | 8862 | 6098 | 7915 | 11342 | 11930 | 7996 | 11988 | 13343 | 12936 | 3857 | 10824 | 11059 | 6208 |
| London | 8160 | 9635 | 9646 | 11653 | 0 | 16902 | 8947 | 5240 | 2506 | 6724 | 5586 | 341 | 9254 | 1434 | 8640 | 10860 | 1436 | 9585 |
| Melbourne | 9093 | 10388 | 7392 | 8862 | 16902 | 0 | 13557 | 16730 | 14418 | 10192 | 16671 | 16793 | 13227 | 15987 | 12644 | 6050 | 15593 | 8159 |
| Mexico City | 12478 | 13703 | 14155 | 6098 | 8947 | 13557 | 0 | 3728 | 10740 | 14679 | 3362 | 9213 | 7669 | 10260 | 3038 | 16623 | 9603 | 11319 |
| Montreal | 10490 | 12744 | 12462 | 7915 | 5240 | 16730 | 3728 | 0 | 7077 | 11286 | 533 | 5522 | 8175 | 6601 | 4092 | 14816 | 5900 | 10409 |
| Moscow | 5809 | 10101 | 7158 | 11342 | 2506 | 14418 | 10740 | 7077 | 0 | 4349 | 7530 | 2492 | 11529 | 2378 | 9469 | 8426 | 1231 | 7502 |
| New Delhi | 2788 | 9284 | 3770 | 11930 | 6724 | 10192 | 14679 | 11286 | 4349 | 0 | 11779 | 6601 | 14080 | 5929 | 12380 | 4142 | 5579 | 5857 |
| New York | 11012 | 12551 | 12984 | 7996 | 5586 | 16671 | 3362 | 533 | 7530 | 11779 | 0 | 5851 | 7729 | 6907 | 4140 | 15349 | 6336 | 10870 |
| Paris | 8236 | 9307 | 9650 | 11988 | 341 | 16793 | 9213 | 5522 | 2492 | 6601 | 5851 | 0 | 9146 | 1108 | 8975 | 10743 | 1546 | 9738 |
| Rio | 17325 | 6075 | 17710 | 13343 | 9254 | 13227 | 7669 | 8175 | 11529 | 14080 | 7729 | 9146 | 0 | 9181 | 10647 | 15740 | 10682 | 18557 |
| Rome | 8144 | 8417 | 9300 | 12936 | 1434 | 15987 | 10260 | 6601 | 2378 | 5929 | 6907 | 1108 | 9181 | 0 | 10071 | 10030 | 1977 | 9881 |
| S.F. | 9524 | 16487 | 11121 | 3857 | 8640 | 12644 | 3038 | 4092 | 9469 | 12380 | 4140 | 8975 | 10647 | 10071 | 0 | 13598 | 8644 | 8284 |
| Singapore | 4465 | 9671 | 2575 | 10824 | 10860 | 6050 | 16623 | 14816 | 8426 | 4142 | 15349 | 10743 | 15740 | 10030 | 13598 | 0 | 9646 | 5317 |
| Stockholm | 6725 | 10334 | 8243 | 11059 | 1436 | 15593 | 9603 | 5900 | 1231 | 5579 | 6336 | 1546 | 10682 | 1977 | 8644 | 9646 | 0 | 8193 |
| Tokyo | 2104 | 14737 | 2893 | 6208 | 9585 | 8159 | 11319 | 10409 | 7502 | 5857 | 10870 | 9738 | 18557 | 9881 | 8284 | 5317 | 8193 | 0 |
The three-dimensional visualization result of MDS is shown above. You can rotate and magnify it on your laptop. The blue points represent Asian cities, the black points represent European cities, and the red points represent North American cities. They are clustered together within the group. If you try to rotate the plot and make these three clusters in the bottom, you will find the three separate cities: Rio, Melbourne and Cape Town on the top of the plot. This actually represents that they are the only cities in Southern-Hemisphere. If you inspect the plot clearly, you will find that these cities actually locate on a sphere, which complies to the true scenario. MDS is quite successful in recovering their relative locations!
Besides showing the capacity of MDS in recovering original expressions and visualization, we also gain two by-products from the examples above.
Classical Scaling and Metric MDS generally require that the input data is a true distance metric, while Non-metric MDS is usually used when the input data doesn’t satisfy the properties of a true distance metric or when the relationships are ordinal (i.e., we only know which distances are larger, but not by how much).
To add, Classical MDS operates by applying an eigen-decomposition to the “double-centered” dissimilarity matrix, while Metric MDS and Non-Metric MDS may employ iterative optimization methods to better accommodate non-linear relationships in data structure.
Before going deep into Classical Scaling, we first need to introduce some important definitions.
\(\Delta\) is symmetric, meaning that:
\[\Delta_{rs} = \Delta_{sr} \quad \text{for all } r \text{ and } s\]
This implies that the distance between points \(r\) and \(s\) is the same as the distance between points \(s\) and \(r\).
The diagonal entries of the matrix represent the distance of a point to itself, and are thus all zeros:
\[\Delta_{rr} \equiv 0 \quad \text{for all } 1 \leq r \leq N\]
All distances are non-negative:
\[\Delta_{rs} \geq 0\]
The matrix respects the triangle inequality:
\[\Delta_{rs} \leq \Delta_{rt} + \Delta_{ts}\]
This property ensures that the direct distance between two points \(r\) and \(s\) is never greater than the sum of the distances from \(r\) to a third point \(t\) and from \(t\) to \(s\).
For configuration \(\vec{y}_i, \vec{y}_j \in \mathbb{R}^t\), the Euclidean distance between point \(r\) and point \(s\) is defined as \(\delta_{ij}=||\vec{y}_i - \vec{y}_j||= \left\{\sum_{k=1}^t \left(y_{i k}-x_{j k}\right)^2\right\}^{1/2}\). Remember to take the square root
Strictly speaking, we require that the Proximity Matrix \(\Delta\) is an Euclidean distance matrix.
Furthermore, it is assumed that the configurations \(\vec{y}_1, \vec{y}_2, \dots, \vec{y}_N\) are centered. In other words, \(\sum_{i=1}^N \vec{y}_i = \vec{0}\). This step eliminates the influence of location.
It may sound difficult to recover the configuration \(\vec{y}_1, \vec{y}_2, \dots, \vec{y}_N\) from merely the Euclidean distance matrix \(\Delta\) at the first glance. However, the situation becomes clearer after some analysis.
\(\Delta^2_{rs}\) can be expressed as \(\Delta^2_{rs} = ||\vec{y}_r||^2 + ||\vec{y}_s||^2 - 2 \vec{y}_r^T \vec{y}_s\). It’s not difficult to observe that both terms are both very easy to be expressed with the original configuration: \(Y = \left(\begin{array}{c} \vec{y}_1^{\top} \\ \vec{y}_2^{\top} \\ \vdots \\ \dot{y}_N^{\top} \end{array}\right)\).
The entries of the inner product matrix, i.e. \(B = YY^T \in \mathbb{R}^{N \times N}\) are able to express the Euclidean distance matrix \(\Delta\).
1) Computing the inner product matrix \(B\)
The inner product term can be represented as: \(B_{ij} = \vec{y}_r^T \vec{y}_s\), while the norms of \(\vec{y}_r\) and \(\vec{y}_s\) can be represented by the entry of \(B\), i.e. \(B_{ii} = ||\vec{y}_i||^2\). As a result, \(\Delta^2_{ij} = B_{ii} + B_{jj} - 2 B_{ij}\).
Key Observation Here, we successfully express the entries of \(\Delta\) (known) with entries of \(B\) (unknown). However, our goal is to find a way to express entries of \(B\) (unknown) with entries of \(\Delta\). Intuitively, we want to cancel out \(B_{ij}\) terms in the expression. Considering that \(\sum_{i=1}^N \vec{y}_i = \vec{0}\), we sum both sides of the equation over the index \(i\). We can get the following expression: (Because \(\sum_{i=1}^N B_{ij} = \vec{y}_j^T (\sum_{i=1}^N \vec{y}_i) = 0\)) \[\sum_{i=1}^N \Delta_{ij}^2=\operatorname{tr}(B) + N B_{ii}\]
Similarly, we can also sum both sides of the equation over index \(j\), and get the expression: \[\sum_{j=1}^N \Delta_{ij}^2=\operatorname{tr}(B) + N B_{jj}\]
We successfully eliminate all the off-diagonal terms of \(B\) through the above steps. Now, we want to take a step further. Sum both sides of the equations over both indexes \(i\) and \(j\). We acquire the following expression: \[\sum_{i=1}^N \sum_{j=1}^N \Delta_{ij}^2 = 2N \operatorname{tr}(B)\]
Now we can solve the entries of \(\Delta\) using the entries of \(B\) through a backward calculation. From the last equation, we get \[\operatorname{tr}(B) = \frac{1}{2N} \sum_{i=1}^N \sum_{j=1}^N \Delta_{ij}^2\]
Then substitute the above expression into above formulas, we get the expression of the diagonal entries of \(B\):
\[B_{ii} = \frac{1}{N} (\sum_{j=1}^{N} \Delta_{ij}^2 - \operatorname{tr} (B))\]
After that, we can finally get the off-diagonal entries of \(B\):
\[\begin{aligned} B_{ij} & = \frac{1}{2} (B_{ii} + B_{jj} - \Delta_{ij}^2) \\ & = -\frac{1}{2} \Delta_{ij}^2 + \frac{1}{N} \sum_{i=1}^N \Delta_{ij}^2 + \frac{1}{N} \sum_{j=1}^N \Delta_{ij}^2-\frac{1}{2 N^2} \sum_{i=1}^N \sum_{j=1}^N \Delta_{ij}^2 \end{aligned}\]Actually, we may also express the inner product matrix \(B\) in a matrix form, which is what we do in real data computation. Here \(A \in \mathbb{R}^{N \times N}, A_{ij} = \Delta^2_{ij} \forall 1 \leq i,j \leq N\), \(H = \mathbb{I}_N - \frac{1}{N} \mathbb{1}_N \mathbb{1}_N^T\).
\[B = H A H\]
2) Recover the coordinates using inner product matrix \(B\)
Both diagonal and off-diagonal entries of the inner product matrix \(B\) has been shown. We assumed that \(B\) = \(YY^T\), so \(B\) is symmetric and positive semi-definite (all eigenvalues are non-negative), with \(t\) positive eigenvalues and \(N-t\) ‘zero’ eigenvalues.
Our intuition is to apply SVD to \(B\) in order to recover the configuration \(\vec{y}_1, \vec{y}_2, \dots, \vec{y}_N \in \mathbb{R}^t\).
\[\begin{align} B & = \left(\vec{u}_1\left|\vec{u}_2\right| \ldots \mid \vec{u}_t\right) \begin{pmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_t \end{pmatrix} \left(\begin{array}{c} \vec{u}_1^{\top} \\ \vec{u}_2^{\top} \\ \vdots \\ \vec{u}_t^{\top} \end{array}\right) \\ & = \tilde{u} \begin{pmatrix} \lambda^{1/2}_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda^{1/2}_t \end{pmatrix} \begin{pmatrix} \lambda^{1/2}_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda^{1/2}_t \end{pmatrix} \tilde{u}^T \\ & = (\tilde{u} \Lambda^{1/2}) (\tilde{u} \Lambda^{1/2})^T \end{align}\]
Let \(Y=\tilde{u} \Lambda^{1/2}\), use rows of \(\tilde{u} \Lambda^{1/2}\) as \(\vec{y}_1, \vec{y}_2, \dots, \vec{y}_N\). It satisfies every entry of the inner product matrix \(B\), as well as all pairwise distances \(\Delta_{ij}\).
Dealing with real data case
In real data case, sometimes the Euclidean condition is not met. As in the previous globe map example, the distance matrix actually computes the geodesic distance instead of the Euclidean distance. Under this circumstance, the “inner product” matrix \(B = HAH\) (we put a quotation mark here because \(B\) is no longer the inner product matrix) is not necssarily positive semi-definite.
Let’s put a numeric example here to illustrate this point.
Suppose we have a proximity matrix \(\Delta\):
Given the matrix: \[ \Delta = \begin{pmatrix} 0 & 1 & 3 \\ 1 & 0 & 1 \\ 3 & 1 & 0 \end{pmatrix} \]
## [,1] [,2] [,3]
## [1,] 0 1 3
## [2,] 1 0 1
## [3,] 3 1 0
Obviously, matrix \(\Delta\) does not satisfy Triangle Inequality, so it’s not an Euclidean distance matrix (even not a distance matrix actually)
## [,1] [,2] [,3]
## [1,] 0.0 -0.5 -4.5
## [2,] -0.5 0.0 -0.5
## [3,] -4.5 -0.5 0.0
## [,1] [,2] [,3]
## [1,] 2.1111111 0.2777778 -2.3888889
## [2,] 0.2777778 -0.5555556 0.2777778
## [3,] -2.3888889 0.2777778 2.1111111
## [1] 4.500000e+00 -7.771561e-16 -8.333333e-01
## [,1] [,2] [,3]
## [1,] 7.071068e-01 0.5773503 -0.4082483
## [2,] 1.110223e-16 0.5773503 0.8164966
## [3,] -7.071068e-01 0.5773503 -0.4082483
Here, we have a negative eigenvalue \(-\frac{5}{6}\). This is because the original proximity matrix \(\Delta\) is not a distance matrix.
Handle Negative Eigenvalues
non-symetric issue: There are cases where the proximity matrix \(\Delta\) is not symmetric (though it rarely happens in classical scaling scenario). Usually we set \(\Delta \leftarrow \Delta + \Delta^T\). In that way, we manually make \(\Delta\) symmetric.
When there exist some negative eigenvalues in the inner product matrix \(B\), we usually have two options to deal with it.
Inflate the original proximity matrix \(\Delta\) by a small constant factor \(c\), i.e., \(\Delta_{ij} \leftarrow \Delta_{ij} + c, \text{if} \quad i \neq j\). In this way, we can deal with the violence of Triangular Inequality.
If there exist several negative eigenvalues with small absolute value (compared to the largest several positive eigenvalues), and there are more positive eigenvalues than our prior estimation (the dimension of the original configuration), we may just pick the largest \(t\) eigenvalues and eliminate the rest.
Global City Distance case revisit
We can also consider the previous global city distance matrix example. Plot the scree plot of the inner product matrix.
We find that the first three eigenvalues are much larger than the rest, so we assume that the dimension of the original configuration is 3, which also complies to our knowledge about global map.